Analysis of a Random Forests Model
Author
Abstract
Random forests are a scheme proposed by Leo Breiman in the 2000s for building a predictor ensemble from a set of decision trees that grow in randomly selected subspaces of the data. Despite growing interest and practical use, there has been little exploration of the statistical properties of random forests, and little is known about the mathematical forces driving the algorithm. In this paper, we offer an in-depth analysis of a random forests model suggested by Breiman in [12], which is very close to the original algorithm. We show in particular that the procedure is consistent and adapts to sparsity, in the sense that its rate of convergence depends only on the number of strong features and not on how many noise variables are present.
Index Terms: Random forests, randomization, sparsity, dimension reduction, consistency, rate of convergence.
2010 Mathematics Subject Classification: 62G05, 62G20.
Research partially supported by the French National Research Agency under grant ANR-09-BLAN-0051-02 “CLARA”. Research carried out within the INRIA project “CLASSIC” hosted by Ecole Normale Supérieure and CNRS.
arXiv:1005.0208v3 [stat.ML], 26 Mar 2012
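To make the abstract's description concrete, the following is a minimal Python sketch of an ensemble of trees grown on randomly selected coordinates. It is an illustration under stated assumptions, not the paper's exact procedure: the features are assumed to lie in [0, 1), each split coordinate is drawn uniformly at random, each cell is cut at its midpoint, and an empty cell predicts 0. The function names `centered_tree_predict` and `centered_forest_predict` and all parameter names are hypothetical.

```python
import numpy as np


def centered_tree_predict(X, y, x, depth, rng):
    """Predict at query point x with one randomized tree on [0, 1)^d.

    Assumptions (not taken from this page): the split coordinate is drawn
    uniformly at random and the current cell is cut at its midpoint.
    """
    d = X.shape[1]
    lower = np.zeros(d)
    upper = np.ones(d)
    for _ in range(depth):
        j = rng.integers(d)                # random coordinate to split on
        mid = 0.5 * (lower[j] + upper[j])  # midpoint of the current cell
        if x[j] < mid:
            upper[j] = mid                 # keep the half containing x
        else:
            lower[j] = mid
    # Average the responses of the training points that share x's cell.
    in_cell = np.all((X >= lower) & (X < upper), axis=1)
    # Assumed convention: an empty cell predicts 0.
    return y[in_cell].mean() if in_cell.any() else 0.0


def centered_forest_predict(X, y, x, n_trees=200, depth=4, seed=0):
    """Average the predictions of n_trees independently randomized trees."""
    rng = np.random.default_rng(seed)
    preds = [centered_tree_predict(X, y, x, depth, rng) for _ in range(n_trees)]
    return float(np.mean(preds))


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    X = rng.random((2000, 5))
    # Only the first coordinate carries signal; the other four are noise.
    y = np.sin(2 * np.pi * X[:, 0]) + 0.1 * rng.standard_normal(2000)
    print(centered_forest_predict(X, y, np.full(5, 0.25)))
```

In this sketch every coordinate is equally likely to be split, so noise variables consume as many cuts as informative ones. The sparsity adaptation described in the abstract concerns a setting where splits concentrate on the strong features, which this uniform version does not attempt to reproduce.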
Related resources
Random forests algorithm in podiform chromite prospectivity mapping in Dolatabad area, SE Iran
The Dolatabad area, located in SE Iran, is a well-endowed terrain hosting several chromite mineralized zones. These chromite ore bodies are all hosted in a colored mélange complex zone comprising harzburgite, dunite, and pyroxenite. The deposits are irregular in shape and are distributed as small lenses along colored mélange zones. The area has a great potential for discovering further chromite...
Comparison of Tourism Placement and Development Models from a Land Use Planning Perspective in Zagros Forests. Case Study: Javanrud County
While travel and tourism have increased in recent years for numerous reasons, the problems caused by this activity have also drawn the attention of managers. Using presence points of tourists in Javanrud County together with Analytic Hierarchy Process (AHP) and Random Forest (RF) models, the conditions for tourist establishment were investigated from the perspective of land use planning. In...
Comparison of Survival Forests in Analyzing First Birth Interval
Background and objectives: The application of statistical machine learning methods, such as ensemble-based approaches, to survival analysis of time-to-event data sets has received considerable interest over the past decades. One of these practical methods is survival forests, which have been developed in a variety of contexts due to their high precision and their non-parametric and non-linear nature. This...
Comparison of Random Survival Forests for Competing Risks and Regression Models in Determining Mortality Risk Factors in Breast Cancer Patients in Mahdieh Center, Hamedan, Iran
Introduction: Breast cancer is one of the most common cancers among women worldwide. Patients with cancer may die due to disease progression or other types of events. These different event types are called competing risks. This study aimed to determine the factors affecting the survival of patients with breast cancer using three different approaches: cause-specific hazards regression, subdistri...
Random forests for survival analysis using maximally selected rank statistics
The most popular approach for analyzing survival data is the Cox regression model. The Cox model may, however, be misspecified, and its proportionality assumption is not always fulfilled. An alternative approach is random forests for survival outcomes. The standard split criterion for random survival forests is the log-rank test statistic, which favors splitting variables with many possible sp...
Pathway analysis using random forests with bivariate node-split for survival outcomes
Motivation: There is great interest in pathway-based methods for genomics data analysis in the research community. Although machine learning methods, such as random forests, have been developed to correlate survival outcomes with a set of genes, no study has assessed the abilities of these methods in incorporating pathway information for analyzing microarray data. In general, genes that are iden...
Journal title:
- Journal of Machine Learning Research
Volume 13, Issue
Pages -
Publication date: 2012